Goto

Collaborating Authors

 defence mechanism


Representation Noising: A Defence Mechanism Against Harmful Finetuning

Neural Information Processing Systems

Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs). While safety measures like preventing jailbreaks and improving safety guardrails are important, such measures can easily be reversed through fine-tuning. In this work, we propose Representation Noising (\textsf{\small RepNoise}), a defence mechanism that operates even when attackers have access to the weights. Importantly, our defence is also able to generalize across different subsets of harm that have not been seen during the defence process as long as they are drawn from the same distribution of the attack set.


A Study on Prompt Injection Attack Against LLM-Integrated Mobile Robotic Systems

arXiv.org Artificial Intelligence

The integration of Large Language Models (LLMs) like GPT-4o into robotic systems represents a significant advancement in embodied artificial intelligence. These models can process multi-modal prompts, enabling them to generate more context-aware responses. However, this integration is not without challenges. One of the primary concerns is the potential security risks associated with using LLMs in robotic navigation tasks. These tasks require precise and reliable responses to ensure safe and effective operation. Multi-modal prompts, while enhancing the robot's understanding, also introduce complexities that can be exploited maliciously. For instance, adversarial inputs designed to mislead the model can lead to incorrect or dangerous navigational decisions. This study investigates the impact of prompt injections on mobile robot performance in LLM-integrated systems and explores secure prompt strategies to mitigate these risks. Our findings demonstrate a substantial overall improvement of approximately 30.8% in both attack detection and system performance with the implementation of robust defence mechanisms, highlighting their critical role in enhancing security and reliability in mission-oriented tasks.


Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning

arXiv.org Artificial Intelligence

Federated learning (FL) has been introduced to enable a large number of clients, possibly mobile devices, to collaborate on generating a generalized machine learning model thanks to utilizing a larger number of local samples without sharing to offer certain privacy to collaborating clients. However, due to the participation of a large number of clients, it is often difficult to profile and verify each client, which leads to a security threat that malicious participants may hamper the accuracy of the trained model by conveying poisoned models during the training. Hence, the aggregation framework at the parameter server also needs to minimize the detrimental effects of these malicious clients. A plethora of attack and defence strategies have been analyzed in the literature. However, often the Byzantine problem is analyzed solely from the outlier detection perspective, being oblivious to the topology of neural networks (NNs). In the scope of this work, we argue that by extracting certain side information specific to the NN topology, one can design stronger attacks. Hence, inspired by the sparse neural networks, we introduce a hybrid sparse Byzantine attack that is composed of two parts: one exhibiting a sparse nature and attacking only certain NN locations with higher sensitivity, and the other being more silent but accumulating over time, where each ideally targets a different type of defence mechanism, and together they form a strong but imperceptible attack. Finally, we show through extensive simulations that the proposed hybrid Byzantine attack is effective against 8 different defence methods.


NeuralSentinel: Safeguarding Neural Network Reliability and Trustworthiness

arXiv.org Artificial Intelligence

The usage of Artificial Intelligence (AI) systems has increased exponentially, thanks to their ability to reduce the amount of data to be analyzed, the user efforts and preserving a high rate of accuracy. However, introducing this new element in the loop has converted them into attacked points that can compromise the reliability of the systems. This new scenario has raised crucial challenges regarding the reliability and trustworthiness of the AI models, as well as about the uncertainties in their response decisions, becoming even more crucial when applied in critical domains such as healthcare, chemical, electrical plants, etc. To contain these issues, in this paper, we present NeuralSentinel (NS), a tool able to validate the reliability and trustworthiness of AI models. This tool combines attack and defence strategies and explainability concepts to stress an AI model and help non-expert staff increase their confidence in this new system by understanding the model decisions. NS provide a simple and easy-to-use interface for helping humans in the loop dealing with all the needed information. This tool was deployed and used in a Hackathon event to evaluate the reliability of a skin cancer image detector. During the event, experts and non-experts attacked and defended the detector, learning which factors were the most important for model misclassification and which techniques were the most efficient. The event was also used to detect NS's limitations and gather feedback for further improvements.


How do we keep AI safe from adversaries?

#artificialintelligence

In the era of Artificial Intelligence, there are several security challenges to keep the machine learning model secure from adversaries. The goal of this paper is to find the solutions to keep AI safe from adversaries. The focus will be on the techniques to defence the adversaries using multi-step approaches. I will begin by explaining what is adversarial in AI and what are the intentions. Then I will explain the taxonomy of it along with strategies.


Adversarial Detection and Correction by Matching Prediction Distributions

arXiv.org Machine Learning

We present a novel adversarial detection and correction method for machine learning classifiers.The detector consists of an autoencoder trained with a custom loss function based on the Kullback-Leibler divergence between the classifier predictions on the original and reconstructed instances.The method is unsupervised, easy to train and does not require any knowledge about the underlying attack. The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST, and remains very effective on CIFAR-10 when the attack is granted full access to the classification model but not the defence. We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence and investigate the robustness of the attack. The method is very flexible and can also be used to detect common data corruptions and perturbations which negatively impact the model performance. We illustrate this capability on the CIFAR-10-C dataset.


Cyber Security Research Centre, Data61, Penten join forces to build AI-enabled defence systems ZDNet

#artificialintelligence

Cyber Security Cooperative Research Centre (CSCRC), together with Data61, the innovation arm of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), and cybersecurity startup Penten, have announced a joint research project that will focus on developing artificial intelligence (AI) enabled cybersecurity defence mechanisms. Under the arrangement announced at D61 Live on Wednesday, Penten will have access to Data61's AI research, which it will use to extend on its existing work to build AI-enabled technology such as "cyber traps" and "decoys". According to Penten CEO Matthew Wilson, using AI will help speed up the creation of cyber traps and make them more realistic. "Our solutions use artificial intelligence to learn the patterns of activity and content from surrounding computers and data. We then use this information to create realistic and believable mimics. This means we can deliver suitable content extremely efficiently, tailored to a customer environment and with minimal effort on the part of the defender," he said.


An Empirical Evaluation of Adversarial Robustness under Transfer Learning

arXiv.org Machine Learning

In this work, we evaluate adversarial robustness in the context of transfer learning from a source trained on CIFAR 100 to a target network trained on CIFAR 10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which features learnt by a fast gradient sign method (FGSM) and its iterative alternative (PGD) can preserve their defence properties against black and white-box attacks under three different transfer learning strategies. We find that using PGD examples during training on the source task leads to more general robust features that are easier to transfer. Furthermore, under successful transfer, it achieves 5.2% more accuracy against white-box PGD attacks than suitable baselines. Overall, our empirical evaluations give insights on how well adversarial robustness under transfer learning can generalise.


Bridging machine learning and cryptography in defence against adversarial attacks

arXiv.org Machine Learning

In the last decade, deep learning algorithms have become very popular thanks to the achieved performance in many machine learning and computer vision tasks. However, most of the deep learning architectures are vulnerable to so called adversarial examples. This questions the security of deep neural networks (DNN) for many security- and trust-sensitive domains. The majority of the proposed existing adversarial attacks are based on the differentiability of the DNN cost function.Defence strategies are mostly based on machine learning and signal processing principles that either try to detect-reject or filter out the adversarial perturbations and completely neglect the classical cryptographic component in the defence. In this work, we propose a new defence mechanism based on the second Kerckhoffs's cryptographic principle which states that the defence and classification algorithm are supposed to be known, but not the key. To be compliant with the assumption that the attacker does not have access to the secret key, we will primarily focus on a gray-box scenario and do not address a white-box one. More particularly, we assume that the attacker does not have direct access to the secret block, but (a) he completely knows the system architecture, (b) he has access to the data used for training and testing and (c) he can observe the output of the classifier for each given input. We show empirically that our system is efficient against most famous state-of-the-art attacks in black-box and gray-box scenarios.


Interview: Cognizant takes thought leadership to AI in security

#artificialintelligence

Cybersecurity and artificial intelligence now seem to go hand in hand as complementary weapons against crime, but what lies ahead? Manish Bahl is the senior director for Cognizant's Centre for the Future of Work. He is responsible for thought leadership around code halos, digital transformation and the future of work. He works with IT and business decision-makers to provide vision on digital transformation and its effects on business, people, technology and culture. "Cognizant's Centre for the Future of Work is a dynamic thought leadership powerhouse that examines how work is changing, and will change, in response to the emergence of new technologies, new business practices, and new workers," he explains.